Installing packages

It is best to install packages (e.g., install.packages("cowplot")) from the R console only. If you include this code in an R code chunk, you risk R trying to reinstall the package each time you knit your file.
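If installation code must live in a script, a common guarded pattern only installs when the package is absent. A sketch; `install_if_missing` is an illustrative helper name, not from these notes:

```r
# Guarded install: only installs when the package is missing, so knitting
# will not reinstall it on every run. The helper name is illustrative.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}
# Usage: install_if_missing("cowplot")
```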

Priors

When discussing prior knowledge in Bayesian methods, is that the same idea as in genomic selection, where a genomic breeding value assigned from training data sets informs a model about how to handle the experimental data sets?

read_excel()

When I try to knit my code, it fails on a read_excel() command. Is this something I need to update? I worked around it by converting the Excel file to a .csv for now.

Themes

  • Loops and functions
    • This unit’s problem set
  • What is likelihood and what does it mean?
  • Likelihood vs. relative likelihood vs. probability
  • Grid approximation vs. optimization
  • How Bayesian methods work “under the hood”
    • More coming soon
    • Bayes rule

\[P[\Theta | Data] = \frac{P[Data | \Theta] \times P[\Theta]}{P[Data]}\]
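A minimal sketch of this rule on a grid, using the 91-of-191 female mouse counts that appear later in these notes and an (illustrative) flat prior:

```r
# Bayes' rule on a grid: posterior for theta = Pr[Female] given 91 females
# out of 191 mice. The flat prior is an illustrative choice.
theta      <- seq(0.001, 0.999, length.out = 200)   # candidate values of theta
prior      <- rep(1, length(theta)) / length(theta) # P[Theta], flat
likelihood <- dbinom(91, size = 191, prob = theta)  # P[Data | Theta]
posterior  <- likelihood * prior / sum(likelihood * prior) # divide by P[Data]
theta[which.max(posterior)] # posterior mode, close to 91 / 191 = 0.476
```

With a flat prior, the posterior mode lands on the same grid point as the maximum likelihood estimate.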

Themes

  • Big picture of analytical vs. ML vs. resampling vs. Bayesian
    • What do we want you to understand about them?
    • More practical examples
    • When to use each
    • Does needing to use one imply bad design?

Mouse weaning data

M <- read_excel("../data/Mouse_Weaning_Data.xlsx") %>% 
  select(MouseID, Sex, WnMass) %>% 
  drop_na() %>% 
  mutate(Sex_f = factor(if_else(Sex == 0, "Female", "Male")))
str(M)
## tibble [2,429 x 4] (S3: tbl_df/tbl/data.frame)
##  $ MouseID: num [1:2429] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Sex    : num [1:2429] 0 1 1 1 1 0 0 1 1 1 ...
##  $ WnMass : num [1:2429] 15.3 12.5 17.4 16.6 16.8 ...
##  $ Sex_f  : Factor w/ 2 levels "Female","Male": 1 2 2 2 2 1 1 2 2 2 ...

Are the sex ratios equal?

After 3 generations

M_3 <- M %>%
  filter(MouseID != -9) %>% 
  filter(MouseID < 400) %>% 
  mutate(Sex_f = factor(if_else(Sex == 0, "Female", "Male")))
str(M_3)
## tibble [191 x 4] (S3: tbl_df/tbl/data.frame)
##  $ MouseID: num [1:191] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Sex    : num [1:191] 0 1 1 1 1 0 0 1 1 1 ...
##  $ WnMass : num [1:191] 15.3 12.5 17.4 16.6 16.8 ...
##  $ Sex_f  : Factor w/ 2 levels "Female","Male": 1 2 2 2 2 1 1 2 2 2 ...

Are the sex ratios equal?

M_3 %>% group_by(Sex_f) %>% tally()
## # A tibble: 2 x 2
##   Sex_f      n
## * <fct>  <int>
## 1 Female    91
## 2 Male     100

Model Likelihood (\(\mathcal{L}\))

For a set of observations (\(Y_i\)) and hypothesized parameters (\(\Theta\)) the model likelihood is the product of the observations’ individual likelihoods:

\[\mathcal{L}\left(\left\{ Y_{i}\right\} _{i=1}^{n};\Theta\right) = \prod_{i=1}^{n}Pr\left[Y_{i}; \Theta\right]\]

Evaluate the likelihood function at different values of \(\Theta\) to estimate \(\mathcal{L}\) for each candidate set of parameters.

Model Likelihood (\(\mathcal{L}\))

\[\mathcal{L}\left(\left\{ Y_{i}\right\} _{i=1}^{n};\Theta\right) = \prod_{i=1}^{n}Pr\left[Y_{i}; \Theta\right]\]

\[\log\left(\mathcal{L}\left(\left\{ Y_{i}\right\} _{i=1}^{n};\Theta\right)\right) = \sum_{i=1}^{n} \log\left(Pr\left[Y_{i};\Theta\right]\right)\]

So we just need to sum the log-likelihoods of the observations to get the model log-likelihood.
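A quick numerical check of this identity, with a few arbitrary observations:

```r
# Individual likelihoods: normal densities of five arbitrary observations.
p <- dnorm(c(-0.5, 0.2, 1.1, -1.3, 0.7))
# The log of the product equals the sum of the logs:
all.equal(log(prod(p)), sum(log(p)))
```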

Likelihood across the range of 0 to 1

Generate a range of possible values for \(\theta\):

theta <- seq(0.001, 0.999, length = 200)

Calculate the probability of observing 91 female mice out of 191 total for each value of \(\theta\):

pr <- dbinom(91, 191, prob = theta)

Convert to log-likelihoods:

log_lik <- log(pr)
log_liks <- tibble(theta, log_lik)

log_liks
## # A tibble: 200 x 2
##      theta log_lik
##      <dbl>   <dbl>
##  1 0.001     -499.
##  2 0.00602   -337.
##  3 0.0110    -282.
##  4 0.0160    -248.
##  5 0.0211    -224.
##  6 0.0261    -205.
##  7 0.0311    -190.
##  8 0.0361    -177.
##  9 0.0411    -165.
## 10 0.0461    -155.
## # ... with 190 more rows

Likelihood across the range of 0 to 1

Maximum likelihood

max(log_lik)
## [1] -2.852506
theta[which.max(log_lik)]
## [1] 0.4774322
91 / 191
## [1] 0.4764398

Probability of 91 or fewer female mice

Assuming \(\theta = 0.5\)

sum(dbinom(0:91, 191, 0.5))
## [1] 0.2813992
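The same lower-tail probability comes directly from the binomial CDF, since pbinom() sums the probabilities from 0 up to its first argument:

```r
# Probability of 91 or fewer females out of 191 at theta = 0.5, via the CDF:
pbinom(91, size = 191, prob = 0.5)  # same as sum(dbinom(0:91, 191, 0.5))
```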

Pr[Female] for all mice

M %>% group_by(Sex_f) %>% tally()
## # A tibble: 2 x 2
##   Sex_f      n
## * <fct>  <int>
## 1 Female  1202
## 2 Male    1227
pr <- dbinom(1202, 1202 + 1227, prob = theta)

Convert to log-likelihoods:

log_lik <- log(pr)
log_liks <- tibble(theta, log_lik)

log_liks
## # A tibble: 200 x 2
##      theta log_lik
##      <dbl>   <dbl>
##  1 0.001      -Inf
##  2 0.00602    -Inf
##  3 0.0110     -Inf
##  4 0.0160     -Inf
##  5 0.0211     -Inf
##  6 0.0261     -Inf
##  7 0.0311     -Inf
##  8 0.0361     -Inf
##  9 0.0411     -Inf
## 10 0.0461     -Inf
## # ... with 190 more rows
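The -Inf values appear because dbinom() underflows to zero at extreme values of \(\theta\) when n = 2,429, and log(0) is -Inf. Asking dbinom() for the log directly avoids the underflow:

```r
# Compute on the log scale (log = TRUE) to avoid underflow at extreme theta:
theta   <- seq(0.001, 0.999, length = 200)
log_lik <- dbinom(1202, 1202 + 1227, prob = theta, log = TRUE)
min(log_lik)               # very negative but finite, not -Inf
theta[which.max(log_lik)]  # maximum is still near 1202 / 2429
```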

Likelihood across the range of 0 to 1

Maximum likelihood

max(log_lik)
## [1] -4.1509
theta[which.max(log_lik)]
## [1] 0.4924774
1202 / (1202 + 1227)
## [1] 0.4948538


Mouse weaning data

M %>% 
  ggplot(aes(WnMass, color = Sex_f)) +
  geom_line(stat = "density", size = 1.5) +
  facet_grid(Sex_f ~ .) +
  scale_color_manual(values = c("Orange", "Purple"), name = "Sex") +
  labs(x = "Wean Mass (g)", y = "Density", title = "Weaning Mass")

Mouse weaning data

What is the weaning mass of male mice?

Analytical mean:

Males <- M %>% filter(Sex_f == "Male")
Males %>%
  summarize(mean_mass = mean(WnMass))
## # A tibble: 1 x 1
##   mean_mass
##       <dbl>
## 1      13.8

Likelihood for each observed value

\[Pr\left[Y_i; \mu, \sigma\right] = \frac{1}{\sqrt{2\pi\sigma^{2}}} e^{\frac{-\left(Y_i-\mu\right)^{2}}{2\sigma^{2}}}\]

  • For any \(\mu\) and \(\sigma\) we can calculate the likelihood of that observation
  • Some values of \(\mu\) and \(\sigma\) will maximize the likelihood for that observation

What values of \(\mu\) and \(\sigma\) simultaneously maximize the likelihood for all the observations?

Model Likelihood (\(\mathcal{L}\))

For a set of observations (\(Y_i\)) and hypothesized parameters (\(\Theta\), here \(\mu\) and \(\sigma\)) the model likelihood is the product of the observations’ individual likelihoods:

\[\mathcal{L}\left(\left\{ Y_{i}\right\} _{i=1}^{n};\Theta\right) = \prod_{i=1}^{n}Pr\left[Y_{i}; \Theta\right]\]

\[\log\left(\mathcal{L}\left(\left\{ Y_{i}\right\} _{i=1}^{n};\Theta\right)\right) = \sum_{i=1}^{n} \log\left(Pr\left[Y_{i};\Theta\right]\right)\]

General ML procedure

Search the parameter space of \(\Theta\) (here \(\mu\) and \(\sigma\)) for the values that maximize the likelihood. For each possible \(\mu\) and \(\sigma\):

  1. Calculate the likelihood for each observation
  2. Sum the log-likelihoods
  3. Search until log-likelihood is maximized

The analytical mean equals the ML estimate of the mean.

  • Why not just use the analytical solution in all cases?

Searching the parameter space

  1. Grid approximation
    • Slow, coarse, doesn’t scale well
  2. Optimization
    • Harder to set up
    • General solution

Likelihood function

log_lik <- function(x, Y){
  log_liks <- dnorm(Y, mean = x[1], sd = x[2], log = TRUE)
  return(sum(log_liks))
}
  • Pass a vector of x <- c(mu, sigma) and vector of Y values
  • Calculate the likelihoods (probability densities) for each Y given the mean and standard deviation
  • Return the sum of the logs of the likelihoods
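For example, with illustrative parameter values and observations:

```r
# The likelihood function from above, applied to three illustrative masses
# with mu = 14 and sigma = 2.5:
log_lik <- function(x, Y){
  log_liks <- dnorm(Y, mean = x[1], sd = x[2], log = TRUE)
  return(sum(log_liks))
}
log_lik(c(14, 2.5), Y = c(13.1, 14.8, 15.3))
```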

Grid approximation

For combinations of \(\mu\) and \(\sigma\), calculate the model likelihood. Take the combination with the largest log-likelihood as the maximum likelihood estimate.

Set up the grid:

n <- 100 # How fine is the grid
mu <- seq(5, 20, length = n) # Vector of mu
sigma <- seq(0.1, 5, length = n) # Vector of sigma

grid_approx <- crossing(mu, sigma) %>% 
  mutate(log_lik = 0)

grid_approx
## # A tibble: 10,000 x 3
##       mu sigma log_lik
##    <dbl> <dbl>   <dbl>
##  1     5 0.1         0
##  2     5 0.149       0
##  3     5 0.199       0
##  4     5 0.248       0
##  5     5 0.298       0
##  6     5 0.347       0
##  7     5 0.397       0
##  8     5 0.446       0
##  9     5 0.496       0
## 10     5 0.545       0
## # ... with 9,990 more rows

Grid approximation

for (ii in 1:nrow(grid_approx)) {
  grid_approx[ii, "log_lik"] <- 
    log_lik(c(grid_approx$mu[ii], grid_approx$sigma[ii]),
            Males$WnMass)
}
  • Iterate through the rows (\(ii\)) of grid_approx
  • For each row, assign the model log-likelihood calculated for that mu and sigma to log_lik

head(grid_approx)
## # A tibble: 6 x 3
##      mu sigma   log_lik
##   <dbl> <dbl>     <dbl>
## 1     5 0.1   -5144773.
## 2     5 0.149 -2301597.
## 3     5 0.199 -1298860.
## 4     5 0.248  -832927.
## 5     5 0.298  -579252.
## 6     5 0.347  -426079.

This approach is coarse, time consuming, and not feasible for fine grids or many parameters.

  • For a 100 × 100 grid, there are 10,000 calculations.
  • If there were 3 parameters, there would be 1,000,000.

Grid approximation

On this grid, the maximum likelihood estimate of \(\mu\) is:

grid_approx[which.max(grid_approx$log_lik), ]
##            mu    sigma   log_lik
## 4228 13.78788 2.475758 -2845.812

The analytical estimate is:

mean(Males$WnMass)
## [1] 13.82238

Maximum likelihood via optimization

Setting reltol tells optim() to stop only when the relative improvement per iteration is \(<10^{-100}\).

  • The default tolerance is \(10^{-8}\), so this will give a very precise estimate.

MLE <- optim(c(13, 1), # Start at mu = 13, sigma = 1
             log_lik,
             Y = Males$WnMass,
             control = list(fnscale = -1,   # Find maximum
                            reltol = 10^-100))

Maximum likelihood via optimization

mean(Males$WnMass)
## [1] 13.82238
print(MLE)
## $par
## [1] 13.822380  2.460218
## 
## $value
## [1] -2845.644
## 
## $counts
## function gradient 
##      111       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

ML considerations

  • glm() uses an optimization routine
  • ML can get stuck in a local maximum (that isn’t the global maximum)
  • Write your own custom likelihood function

Resampling inference of the mean

set.seed(36428)
reps <- 100000
resampled_mean <- numeric(length = reps)
for (i in 1:reps) {
  resampled_mean[i] <- mean(sample(Males$WnMass,
                                   replace = TRUE))
}

Each iteration:

  • Resample Males$WnMass with replacement. Yields a vector equal in length to Males$WnMass but where each value can appear more than once.
  • Calculate the mean of this resampled vector. Assign to the ith position in resampled_mean.

The mean of these means is the estimate of the population mean (via central limit theorem).
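The loop can also be written with replicate(). A self-contained sketch, with synthetic masses standing in for Males$WnMass:

```r
# Bootstrap the mean with replicate(); the normal draws are a placeholder
# for the real Males$WnMass vector.
set.seed(1)
WnMass <- rnorm(500, mean = 13.8, sd = 2.5)  # synthetic stand-in data
boot_means <- replicate(10000, mean(sample(WnMass, replace = TRUE)))
mean(boot_means)  # very close to mean(WnMass)
```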

Resampling inference of the mean

mean(Males$WnMass)
## [1] 13.82238
mean(resampled_mean)
## [1] 13.82238

Bayesian inference of the mean

Ranges of possible parameter values under maximum likelihood:

  1. \(\mu\): \(-\infty < \mu < \infty\)
  2. \(\sigma\): \(0 < \sigma < \infty\)

Drawbacks:

  1. \(\mu\) can't actually be negative (there are no negative masses) and, biologically, probably isn't very small either
  2. \(\sigma\) is probably not huge either

Can we do better? Yes, Bayesian priors.
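For intuition about how much information a prior carries, compare the densities of a weak and a very strong prior for \(\mu\). The means and SDs match models fit later in these notes; the evaluation grid is an illustrative choice:

```r
# Density of a weak prior, N(14, 5), vs. a very strong prior, N(14, 0.1):
mu_grid <- seq(0, 30, length.out = 601)
weak    <- dnorm(mu_grid, mean = 14, sd = 5)
strong  <- dnorm(mu_grid, mean = 14, sd = 0.1)
mu_grid[which.max(strong)]  # both priors peak at mu = 14...
max(strong) / max(weak)     # ...but the strong prior is ~50x more concentrated
```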

Prior for the mean

Bayesian MCMC sampling

  1. Write your own (not recommended)
  2. WinBUGS / OpenBUGS
  3. JAGS
  4. stan

R, Python, MATLAB, etc. have interfaces to samplers 2–4.

In R

  • rstan: Build your own models
  • rethinking: Convenience interface to rstan
  • brms and rstanarm: High-level model syntax

Bayesian model for mean \(\mathcal{N}(14, 5)\)

fm_sd5 <- ulam(
  alist(
    WnMass ~ dnorm(mu, sigma),
    mu ~ dnorm(14, 5),
    sigma ~ dcauchy(0, 3)
  ),
  data = Males_sub
)
## 
## SAMPLING FOR MODEL '87bc108e88ace8f48ed0725b0d6d7d0e' NOW (CHAIN 1).
## Chain 1: Rejecting initial value:
## Chain 1:   Error evaluating the log probability at the initial value.
## Chain 1: Exception: normal_lpdf: Scale parameter is -1.01634, but must be > 0!  (in 'model44cce6d7376_87bc108e88ace8f48ed0725b0d6d7d0e' at line 11)
## 
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
## Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
## Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
## Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
## Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
## Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
## Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
## Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
## Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
## Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
## Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
## Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.038 seconds (Warm-up)
## Chain 1:                0.026 seconds (Sampling)
## Chain 1:                0.064 seconds (Total)
## Chain 1:

Bayesian model for mean \(\mathcal{N}(14, 5)\)

mean(Males_sub$WnMass)
## [1] 13.82238
precis(fm_sd5)
##            mean         sd      5.5%     94.5%    n_eff     Rhat4
## mu    13.818507 0.06727362 13.711583 13.927994 242.0024 1.0038579
## sigma  2.460989 0.04948409  2.382388  2.539553 320.0294 0.9989122

Bayesian model for mean \(\mathcal{N}(14, 10)\)

fm_sd10 <- map2stan(
  alist(
    WnMass ~ dnorm(mu, sigma),
    mu ~ dnorm(14, 10),
    sigma ~ dcauchy(0, 3)
  ),
  data = Males_sub
)
## 
## SAMPLING FOR MODEL '506ccf234967eaf90497b783b32a95ff' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.05 seconds (Warm-up)
## Chain 1:                0.048 seconds (Sampling)
## Chain 1:                0.098 seconds (Total)
## Chain 1:
## Computing WAIC
## Error in .local(fit, data, n, ...) : 
##   There appear to be no linear models here

Bayesian model for mean \(\mathcal{N}(14, 10)\)

mean(Males_sub$WnMass)
## [1] 13.82238
precis(fm_sd10)
##            mean         sd      5.5%     94.5%    n_eff     Rhat4
## mu    13.823756 0.06891462 13.718320 13.929705 601.2564 0.9999429
## sigma  2.463075 0.04867016  2.386445  2.543367 567.7990 1.0016267

Bayesian model for mean \(\mathcal{N}(10, 5)\)

fm_mu10 <- map2stan(
  alist(
    WnMass ~ dnorm(mu, sigma),
    mu ~ dnorm(10, 5),
    sigma ~ dcauchy(0, 3)
  ),
  data = Males_sub
)
## 
## SAMPLING FOR MODEL 'c17be1a859b3f05ea6a9ed4d2290bba4' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.054 seconds (Warm-up)
## Chain 1:                0.05 seconds (Sampling)
## Chain 1:                0.104 seconds (Total)
## Chain 1:
## Computing WAIC
## Error in .local(fit, data, n, ...) : 
##   There appear to be no linear models here

Bayesian model for mean \(\mathcal{N}(10, 5)\)

mean(Males_sub$WnMass)
## [1] 13.82238
nrow(Males)
## [1] 1227
precis(fm_mu10)
##            mean         sd      5.5%     94.5%    n_eff     Rhat4
## mu    13.819822 0.07090982 13.703792 13.933466 605.9309 1.0008821
## sigma  2.464692 0.05389681  2.382107  2.555207 705.0477 0.9994903

Bayesian model for mean \(\mathcal{N}(14, 0.1)\)

fm_sd0.1 <- map2stan(
  alist(
    WnMass ~ dnorm(mu, sigma),
    mu ~ dnorm(14, 0.1),
    sigma ~ dcauchy(0, 3)
  ),
  data = Males_sub
)
## 
## SAMPLING FOR MODEL '8082e8efa66a556437ce4fe9d7803a53' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.054 seconds (Warm-up)
## Chain 1:                0.055 seconds (Sampling)
## Chain 1:                0.109 seconds (Total)
## Chain 1:
## Computing WAIC
## Error in .local(fit, data, n, ...) : 
##   There appear to be no linear models here

Bayesian model for mean \(\mathcal{N}(14, 0.1)\)

mean(Males_sub$WnMass)
## [1] 13.82238
precis(fm_sd0.1)
##            mean         sd      5.5%     94.5%    n_eff     Rhat4
## mu    13.879698 0.05778947 13.787619 13.970305 703.2480 0.9996272
## sigma  2.460888 0.04843463  2.386658  2.539512 802.6935 1.0082159

Bayes priors

  • How much information do you have?
  • Try different priors
    • What is the effect?
  • Explicit model comparison
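Collecting the posterior means for \(\mu\) reported by precis() above makes the comparison concrete: the three diffuse priors leave the estimate at essentially the sample mean (13.822), while the very strong \(\mathcal{N}(14, 0.1)\) prior pulls it toward 14:

```r
# Posterior means for mu under each prior (values copied from the precis()
# output earlier in these notes; the sample mean is 13.822):
prior_means <- data.frame(
  prior  = c("N(14, 5)", "N(14, 10)", "N(10, 5)", "N(14, 0.1)"),
  mu_hat = c(13.819, 13.824, 13.820, 13.880)
)
prior_means
```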